refactor(ce-code-review): lean apply model — safe self-apply interactive, report-only for callers#881
Conversation
…wers Remove DHH/Kieran stack reviewers, fold structural checks into maintainability, suppress P3 findings in synthesis, and merge migration/schema-drift agents into a single conditional data-migration reviewer. Co-authored-by: Cursor <cursoragent@cursor.com>
Extend data-migration Step 0 to diff db/structure.sql when present, and route plan deepening migration risks to ce-data-integrity-guardian instead of the PR-review persona. Co-authored-by: Cursor <cursoragent@cursor.com>
Re-enable P3 findings in reports and synthesis after the persona refactor temporarily suppressed them. Register removed migration and stack reviewers in LEGACY_ONLY_AGENT_DESCRIPTIONS and EXTRA_LEGACY ce-* agent paths so upgrades sweep stale flat installs. Co-authored-by: Cursor <cursoragent@cursor.com>
Register removed review personas in STALE_AGENT_NAMES and LEGACY_ONLY_AGENT_DESCRIPTIONS, and resolve current-agent lookup for names that already include the ce- prefix so cleanupStaleAgents removes flat installs like ce-dhh-rails-reviewer.md. Co-authored-by: Cursor <cursoragent@cursor.com>
…callers Orchestrated workflows now invoke review for findings and own fix application separately, so ce-work and lfg can batch fixes without the review skill mutating the checkout. Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve ce-code-review Stage 6 conflict by keeping review-only output (no Applied Fixes section, JSON schema for mode:agent). Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0346beae3b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5391c57851
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…te (#881) - Duplicate review-findings-followup into ce-work-beta for isolated installs - Inline LFG apply bar in lfg/references/review-followup.md - Replace stale headless text envelope with JSON contract in review-output-template Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bee6ec5cc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Return status skipped in mode:agent when PR pre-checks bail out. In pr-remote scope, fetch PR head ref and forbid workspace Read/Grep for changed files; validators use git show or diff hunks only. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a2ec295fb5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Add branch-remote inspection rules for no-checkout branch reviews, use local tree diffs instead of appending gh pr diff when aligned on the PR branch, and sync ce-work-beta residual gate sentinels with ce-work. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3f22822d9b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…#881) Two remaining gaps in the review-scope model surfaced by Codex review of 3f22822, both causing findings to be validated against the wrong tree: - PR scope classification trusted the branch name alone, so a fork PR (or a stale local branch) sharing a name with the PR head was treated as local-aligned and diffed/inspected against the local checkout. Now local-aligned also requires the PR to be same-repo (isCrossRepository false) and the PR head commit to be an ancestor of local HEAD; metadata fetch gains headRefOid + isCrossRepository. - Stage 5b validators never received the scope mode or remote head ref, so in pr-remote/branch-remote they fell back to Read/Grep on the stale workspace. Now the scope mode and <pr-head-ref>/<branch-head-ref> are injected into validator prompts (mirroring Stage 4), with inspection access scoped by mode. The validator template already consumed these. Note: pre-existing CLI install test failures (tests/cli.test.ts) are unrelated to this change and present on HEAD before it.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bc9d30260a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ew in apply followup Address PR review feedback (#881): - ce-code-review: Stage 2 branch-mode intent discovery now runs `git log ${BASE}..<branch-ref>` using the resolved ref instead of the raw `<branch>` argument, so remote-only branch reviews read the right commit intent instead of failing or reading a stale same-named local branch. - ce-work / ce-work-beta apply followup: consume the review already produced by Tier 2 step 2a instead of re-invoking ce-code-review; re-invocation is now a documented cold-caller fallback. Avoids a second full review per Tier 2 run. Updated the ce-work SKILL.md section anchor and the contract test to lock in the corrected (no-double-review) behavior. Note: pre-existing failures in CLI install/cleanup/target-detection tests (47, unrelated to these skill-content changes) not addressed by this PR.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 633576a3f9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…gs-table output ce-code-review hardening prompted by a dogfood run that shipped findings in the output template's own documented anti-pattern format and skipped validation on the P1 findings: - Stage 5b: resolve the "skip when >15 survivors" vs "validate top-15" contradiction. The stage no longer skips while any finding survives, and P0/P1 findings are always validated (raise the cap rather than drop a critical); only the P2/P3 tail is dropped when over budget. - Stage 5b: sanction orchestrator direct verification of load-bearing facts (a pinned dependency's source, repo wiring) as a complement to the validator wave, using single-purpose native tools instead of chained shell. - Stage 6: inline a literal findings-table skeleton in the always-loaded SKILL.md plus a concrete negative-signature tripwire (no Field: blocks, no box-drawing/middot separators, no lists) so the output contract survives a long session even when the template reference was not reloaded. - Output format: state that mode:agent JSON is the deterministic cross-harness machine contract and default markdown is the human view; keep markdown ASCII-safe so it degrades gracefully across terminals and harnesses. - Lock the P0/P1-always-validated invariant into the Stage 5b contract test.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0b4baf792c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The human-facing per-severity tables are now 5 columns (# | File | Issue | Reviewer | Confidence). The synthesized route (autofix_class -> owner) moves to the Actionable Findings table and the mode:agent JSON, where it is actionable — keeping the scannable severity tables narrow enough to render across terminals and non-Claude harnesses. The finding set, routing, confidence gating, and JSON contract are unchanged; this is presentation only. Updates the Stage 6 inline skeleton and column description, the output-template examples and formatting rules, the numbering fixture, and the primary-findings regex in the contract test.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 729b7fef99
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Re-introduce auto-fixing into ce-code-review as a lightweight, judgment-based act policy (not the removed mode:autofix machinery). In default (interactive) mode the review now applies the safe, verified fixes it is confident in and reports them in an Applied section; in mode:agent it stays report-only and the caller applies. It never commits or pushes — the human/caller owns permanence. - Stage 5c "Act on findings (default mode only)": bias-to-act policy (apply clear reversible improvements; push back if the reviewer is wrong; defer what needs a decision), scope invariant (local-aligned/standalone only), verify-then-keep (revert on red), no auto-commit, and prominent surfacing of green-but-unverifiable edits (auth/contract/concurrency). No deny-list: a code-review fix is a reversible edit, so downside is controlled after the fact (revert + visible diff + commit checkpoint). - Stage 6 + output template: Applied section (# | File | Fix | Reviewer + validation line); applied findings keep their stable # and stay out of the severity tables. - Operating principles / Action Routing / autofix deprecation reframed: autofix_class is signal, never an apply gate; mode:agent never mutates. - ce-work and ce-work-beta apply step reframed from a conservative "when unsure, skip" eligibility filter to the same bias-to-act policy. - Contract test + numbering fixture cover the new behavior; skill doc updated (also corrected pre-existing drift: removed safe_auto and four-modes framing). Plan: docs/plans/2026-06-02-001-feat-ce-code-review-safe-autofix-plan.md
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee7891edd3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ix change Simplification pass over ee7891e found stale/contradictory prose the feature commit missed: - SKILL.md "After Review": dropped the "Review-only handoff / do not edit project files" framing that contradicts Stage 5c default-mode apply; scoped apply vs report-only by mode. - ce-work / ce-work-beta followup: scoped the "review-only / does not mutate" claim to the mode:agent invocation (true for the caller path) rather than asserting it of the skill as a whole. - docs/skills + output template: removed the stale safe_auto promotion bullet and the deprecated report-only Mode-line value. - contract test: de-duplicated the mutate-contract assertions across two tests (each invariant asserted once in its relevant test) and fixed a misleading comment. Behavior preserved: contract tests 28/0; full suite at the 47 pre-existing failures, zero new.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cbc343925e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…er-time template load Dogfood run showed the findings format still escaping to field-blocks for detail-heavy P1s (while P2/P3 stayed tables) and a duplicate finding number on a multi-file applied fix. Make the canonical output skeleton accommodate detail so the format stops fighting the content: - Findings: terse Issue cell (the scannable index) + a keyed detail line (`- **#N** — ...`) under each severity table for findings that need depth (usually P0/P1). One shape for every severity. - review-output-template.md is the canonical copy-this skeleton; Stage 6 now instructs loading and mirroring it at render time (not just "see the template"), with the inline stub as the always-loaded fallback. - Format gate now catches the real failure (inconsistent treatment across severities) and explicitly does not flag the keyed detail line. - Applied section: a multi-file fix is one row with one # (no duplicate numbers); green-but-unverifiable edits flagged inline in the Fix cell. - Tests + fixture cover the detail line, the multi-file one-row applied fix, and the consistency/load contract. Behavior preserved: contract tests 29/0; full suite at the 47 pre-existing failures, zero new.
The latest dogfood run kept the terse-cell + detail-line format (good) but still packed full sentences into several P2 Issue cells. Replace the vague "terse" guidance with a checkable bar: one short clause (~12 words or fewer, no second sentence, no because/so/which explanation); the moment a cell wants a comma-plus-clause or a reason, move the depth to the keyed detail line. Applied in the SKILL.md Stage 6 skeleton + Findings step and the output template; contract test asserts the named test is present.
…e commit Refine the interactive apply model after design review. The permanence gate is the push, not the commit: a local commit is private and reversible, and leaving applied fixes uncommitted on a clean tree just creates a chore plus a window where the user's next edits entangle with them. - Stage 5c: when the working tree was clean before the review, commit the applied fixes as one isolated `fix(review):` commit; on a dirty tree apply but leave them for the user's commit (the fixes can't be isolated from WIP). Never push, open PRs, or file tickets. - Operating principle, After Review, the Applied report step, the output template, the skill doc, and the plan all updated to commit-on-clean / never-push (was "never commits or pushes"). - Contract tests updated to the new contract. bun test: contract 29/0; full suite at the 47 pre-existing failures, zero new.
… contract Address 7 P2 review threads, several of them fallout from this branch's apply-model changes: - mode:agent is now described as report-only (skips Stage 5c apply), not "serialization only" — a runner could otherwise apply during mode:agent and the caller would re-apply the same findings. - Deprecated mode:autofix removed from the conflict-stop list so it no longer contradicts the deprecated-mode "ignore and proceed" path. - Agent-mode output: the markdown Actionable Findings summary is default-mode only; mode:agent carries actionable findings solely in the JSON field and emits nothing after the object. - Agent-mode JSON: require one raw (unfenced) JSON object so naive JSON.parse callers don't choke on a leading code fence. - pr-remote scope: resolve a real PR_BASE_REF (fetch baseRefName + rev-parse) and pass it to reviewers/validators so the data-migration persona's git diff <base> -- db/schema.rb schema-drift check works on remote PRs. - ce-optimize: inline a local mechanical-apply bar instead of referencing ce-work's review-findings-followup.md (skills must be self-contained). - lfg review-followup: make the post-fix push upstream-aware so a fresh branch with no upstream doesn't break the autonomous pipeline. Contract test updated for the report-only reframe. bun test: contract 29/0; full suite at the 47 pre-existing failures, zero new.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 89cff2103f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
PR #881 feedback: Stage 5b dropped a finding as "validator failed" on any validator infrastructure failure (timeout / dispatch error / malformed JSON). For P0/P1 that silently removes the highest-severity issues from the report/actionable JSON on a transient failure — the opposite of safe, and now that the validation pass always runs (default + mode:agent) it's reachable. Now: P2/P3 still drop on infra failure (conservative bias); P0/P1 are kept and their validation marked degraded (reported in Coverage). A genuine validated:false rejection still drops at any severity. bun test: contract 29/0; full suite at the 47 pre-existing failures, zero new.
What this does
ce-code-reviewdoes auto-fixes again — but lean. The old posture bolted fixing onto reviewing through a heavyweightmode:autofixmode, asafe_autopermission tier, and in-skill walkthrough / bulk-preview / ticket-defer machinery. This PR throws that machinery out and replaces it with a short, judgment-based act policy, organized around who owns the working tree:/ce-code-review): the review applies the safe, verified fixes it's confident in — test hardening, mechanical fixes — and reports them in an Applied section. When your working tree was clean before the review it commits them as an isolatedfix(review):commit; on a dirty tree it leaves them for your commit (they fold into the work you were already committing). It never pushes. Safe fixes come off your plate — returned to a clean state, not left as a chore — and you stay the gate on the push.mode:agent(a caller likece-work/lfgruns it): the review is report-only — it mutates nothing and hands findings to the caller, which owns apply and the Residual Work Gate. Pipelines are no longer surprised by mid-run tree mutations.Apply policy: bias to act on clear, reversible improvements; push back (don't apply) when a reviewer is wrong; defer anything that needs a design decision. The downside is controlled after the fact — reversible edits (a local commit reverts cleanly too), a visible diff, verify-then-keep (revert on red), and the push being the one outward step you own — not by a permission tier. Apply happens only on the tree that was reviewed (
local-aligned/ standalone), never on a remote-PR diff.How it materially changed from the old posture
mode:autofixmode +safe_autopermission tier; apply gated by classsafe_autoremaps togated_autoSKILL.md898 lines + ~510 lines of apply-machinery referencesSKILL.md693 lines (−205 / −23%); machinery references deleted (walkthrough−249,tracker-defer−149,bulk-preview−112). Net the skill is ~690 lines lighter even after adding the new act policy + output skeletonSKILL.md; triage spun up walkthrough/preview/defer flowsmode:agentJSON contract; one ASCII-safe table shape (terse cell + keyed detail line) for every severityNet: the review is simpler, lighter, and faster to load, and it still takes the safe, tedious fixes off your plate — via a lean policy instead of a heavy mode.
Apply model (detail)
mode:autofixcould edit the checkoutmode:agentreports onlysafe_autotier + opaque in-skill autofixsafe_autoremaps togated_autoce-work/lfg)mode:agentJSON — one parseable object +review.jsonartifactfix(review):on a clean tree, left for your commit on a dirty one); report-only inmode:agentThe review never pushes, opens PRs, or files tickets in any mode; in interactive mode it commits its applied fixes when the tree was clean before the review (the permanence gate is the push, not the local commit).
mode:headlessis a deprecated alias formode:agent;mode:report-onlyis ignored.Output quality & reliability
testing_gaps/residual_risks).Issuecell + keyed detail line, the same shape for every severity, ASCII-safe pipe tables (no box-drawing,->not unicode). The canonical skeleton lives inreferences/review-output-template.mdand is loaded and mirrored at render time.mode:agentJSON is the deterministic surface for automation; the markdown report is the human view.Orchestrator changes (
ce-work,ce-work-beta,lfg)The callers own apply, with a bias-to-act step (apply clear fixes; defer what needs a decision — never a conservative "when unsure, skip" filter):
ce-code-review mode:agent→ (2b)review-findings-followup.mdapplies findings (batched by file) → Residual Work Gate.lfgmirrors the split with a self-containedlfg/references/review-followup.md;ce-work-betacarries its own copy ofreview-findings-followup.md(skills install as isolated directories — no cross-skill references).Files
ce-code-review:references/walkthrough.md,references/bulk-preview.md,references/tracker-defer.md(the in-skill apply machinery)references/action-class-rubric.md; caller-ownedreview-findings-followup.mdunderce-work/ce-work-betace-code-review/SKILL.md+references/review-output-template.mdTest plan
bun test tests/review-skill-contract.test.ts tests/pipeline-review-contract.test.ts(70 pass) — covers the apply model (report-onlymode:agentvs interactive self-apply, commit-on-clean-tree, never push), the validation cap (P0/P1 never dropped), the terse-cell + keyed-detail-line format, stable numbering across Applied/severity/Actionable, and nomode:autofix/ no deny-listbun run release:validatein synctestcheck on PR